NBA PCA Analysis

Looking at similarities between NBA players from the 2015-2016 season

Roupen Khanjian
01-25-2021
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(scales) # Scale Functions for Visualization, CRAN v1.1.1
library(ggfortify) # Data Visualization Tools for Statistical Analysis Results, CRAN v0.4.11
library(gghighlight) # Highlight Lines and Points in 'ggplot2', CRAN v0.3.1
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3
library(gt) # Easily Create Presentation-Ready Display Tables, CRAN v0.2.2 

Brief Introduction to Data

The data used for this task was obtained from the following link: data. I analyzed National Basketball Association (NBA) player statistics from the 2015-2016 season. Each observation in this dataset holds one player's per-game statistics. I chose PCA to see how the players differed across 11 features that are considered important for a basketball player's success.

Data Wrangling and PCA

nba_players <- read_csv(here("_texts", 
                             "NBA_PCA",
                             "data", "nba_players.csv")) %>% 
  clean_names() %>% 
  separate(player, into = c("player", "html"), sep = "\\\\") %>% # clean the player name column
  dplyr::filter(mp > 18) %>% # filter for players who played over 18 minutes a game (out of a possible 48)
  dplyr::filter(g > 30) %>% # filter for players who played over 30 games (out of a possible 82)
  drop_na(age, fga, e_fg_percent, ft_percent, trb:pts)  # drop observations with missing values 

nba_players_pca <- nba_players %>%
  dplyr::select(age, fga, e_fg_percent, ft_percent, trb:pts) %>% # select the 11 features for PCA
  scale() %>% # center each feature and scale it to unit variance
  prcomp() # run PCA

# Quick look at the data
nba_players %>%
  dplyr::select(player, pos, age, fga, e_fg_percent, ft_percent, trb:pts) %>% 
  filter(player %in% sample(player, size = 5)) %>% 
  gt() %>% 
    tab_header(
      title = "Statistics from a Random Sample of Five Players",
      subtitle = "From the 2015-2016 NBA regular season"
    ) %>% 
    fmt_percent(
      columns = vars(e_fg_percent, ft_percent),
      decimals = 1
    ) %>% 
  tab_style(
    style = list(
      cell_text(style = "italic"),
      cell_borders(
        side = c("right"), 
        color = "black",
        weight = px(2)
        )
    ),
    locations = cells_body(
      columns = 1
    ))  %>% 
  cols_label(
    pos = "position"
  ) 
Statistics from a Random Sample of Five Players
From the 2015-2016 NBA regular season

player            position  age  fga   e_fg_percent  ft_percent  trb  ast  stl  blk  tov  pf   pts
Trevor Ariza      SF        30   10.6  52.3%         78.3%       4.5  2.3  2.0  0.3  1.4  2.2  12.7
Patrick Beverley  PG        27   8.4   53.9%         68.2%       3.5  3.4  1.3  0.4  1.3  3.3  9.9
Bojan Bogdanović  SF        26   9.5   51.9%         83.3%       3.2  1.3  0.4  0.1  1.5  1.5  11.2
JaMychal Green    PF        25   6.3   48.0%         75.2%       4.8  0.9  0.6  0.4  1.1  2.4  7.4
Mirza Teletović   PF        30   9.8   54.4%         77.4%       3.8  1.1  0.4  0.3  1.1  2.0  12.2
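
A biplot shows only the first two principal components, so it can help to check how much of the total variance they capture. The following is a minimal sketch rather than part of the original analysis: `variance_explained()` is a hypothetical helper, and the built-in `USArrests` data is an assumed stand-in for the NBA features so that the block runs on its own.

```r
# Hypothetical helper (not from the original analysis): proportion of variance
# explained by each principal component of a prcomp object.
variance_explained <- function(pca) {
  var_pc <- pca$sdev^2                         # variance captured by each component
  data.frame(
    component  = paste0("PC", seq_along(var_pc)),
    proportion = var_pc / sum(var_pc),         # share of the total variance
    cumulative = cumsum(var_pc) / sum(var_pc)  # running total of that share
  )
}

# Stand-in data (assumption): USArrests ships with base R.
demo_pca <- prcomp(scale(USArrests))           # same scale() then prcomp() pattern as above
demo_var <- variance_explained(demo_pca)
print(demo_var)
```

With the objects from this post, the equivalent call would be `variance_explained(nba_players_pca)`; the first two rows indicate how much of the variance the biplot axes represent.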

Biplot

autoplot(nba_players_pca,
         data = nba_players,
         loadings = TRUE,
         loadings.label = TRUE,
         loadings.colour = "khaki2",
         loadings.label.colour = "black",
         loadings.label.fontface = "bold",
         colour = "pos" # organize colors based off position
         ) +
  labs(title = "Biplot for PCA",
       caption = "Biplot of NBA players' basic statistics\nfrom the 2015-2016 NBA season.\nColors are organized by position.",
       colour = "Position") +
  theme_minimal() +
  theme(axis.title = element_text(face = "bold", size = 12),
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold", size = 13)
        )
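
The arrows in the biplot are the loadings stored in the `rotation` element of the `prcomp` object, so they can also be read numerically. The sketch below is only illustrative: `top_loadings()` is a hypothetical helper, and `USArrests` is again an assumed stand-in for the NBA data so the block is self-contained.

```r
# Hypothetical helper: features with the largest absolute loadings on one component.
top_loadings <- function(pca, component = "PC1", n = 3) {
  loadings <- pca$rotation[, component]   # named vector, one loading per feature
  ordered  <- loadings[order(abs(loadings), decreasing = TRUE)]
  head(ordered, n)                        # keep the n strongest contributors
}

# Stand-in data (assumption): USArrests ships with base R.
demo_pca <- prcomp(scale(USArrests))
demo_top <- top_loadings(demo_pca, "PC1", n = 2)
print(demo_top)
```

For this post the call would be `top_loadings(nba_players_pca, "PC1")`. The sign of a loading depends on the orientation `prcomp()` happens to choose, so the magnitudes are the more reliable guide.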

A few observations from the above biplot:

Biplot Highlighting a Few Players

Below is the same biplot, this time highlighting the top five players of that season according to the MVP voting (which can be found here: MVP voting).

autoplot(nba_players_pca,
         data = nba_players,
         loadings = TRUE,
         loadings.label = TRUE,
         loadings.colour = "khaki2",
         loadings.label.colour = "black",
         loadings.label.fontface = "bold",
         colour = "player"
         ) +
  labs(title = "Biplot for PCA",
       subtitle = "Top 5 players in MVP Voting are Highlighted",
       caption = "Biplot highlighting some of the best players for the 2015-2016 NBA season") +
  gghighlight(player %in% c("Kawhi Leonard", "Stephen Curry", "LeBron James",
                            "Russell Westbrook", "Kevin Durant")) + # top 5 players in MVP voting
  theme_minimal() +
  theme(axis.title = element_text(face = "bold", size = 12),
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold", size = 13),
        plot.subtitle = element_text(size = 11)
        )

Biplot Using plotly to see Similarities Between Players

Lastly, to see which players are similar to one another, I made an interactive plot where you can hover over each data point to reveal the player's name.

nba_pca_plot <- autoplot(nba_players_pca,
         data = nba_players,
         loadings = TRUE,
         loadings.label = TRUE,
         loadings.colour = "khaki2",
         loadings.label.colour = "black",
         loadings.label.fontface = "bold",
         colour = "player", # color by player so each point carries the player's name
         colour.show.legend = FALSE
         ) +
  labs(title = "Interactive Biplot") +
  theme_minimal() +
  theme(axis.title = element_text(face = "bold", size = 12),
        panel.grid.minor = element_blank(),
        legend.position = "none",
        plot.title = element_text(face = "bold", size = 13)
        )

ggplotly(nba_pca_plot, tooltip = "player") # interactive plot
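
Hovering over points is one way to spot similar players; the same idea can be approximated numerically by measuring distances between observations in the space of the first two components. This is a rough sketch under stated assumptions, not the method used in this post: `nearest_in_pc_space()` is a hypothetical helper, and `USArrests` stands in for the NBA data so the block runs without the CSV file.

```r
# Hypothetical helper: the k observations closest to `target` in the space
# spanned by the first `n_pc` principal components.
nearest_in_pc_space <- function(pca, labels, target, k = 3, n_pc = 2) {
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]  # coordinates of each observation
  idx    <- match(target, labels)                 # first row matching the target
  if (is.na(idx)) stop("target not found in labels")
  # Euclidean distance from the target to every observation
  d <- sqrt(rowSums(sweep(scores, 2, scores[idx, ])^2))
  names(d) <- labels
  head(sort(d[-idx]), k)                          # drop the target itself, keep the k nearest
}

# Stand-in data (assumption): USArrests ships with base R.
demo_pca <- prcomp(scale(USArrests))
demo_nn  <- nearest_in_pc_space(demo_pca, rownames(USArrests), "California", k = 3)
print(demo_nn)
```

With this post's objects the call would look like `nearest_in_pc_space(nba_players_pca, nba_players$player, "Stephen Curry")`. Two components discard part of the variance, so players who sit close together in the plot may still differ on the remaining components.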